Language trees, zipping and error estimation
Abstract
A method was recently proposed to estimate the distance between a pair of given texts. The distance estimate appeared reliable enough to infer a phylogenetic tree of languages, even though no error estimate was provided. This essay reviews the method and explains its application to inferring phylogeny from a collection of heterogeneous texts. An approach for estimating the confidence of the classification is introduced and the results are discussed.

1 Background and method

A simple method for estimating a “distance” between a pair of given texts was recently described in a paper [1], which attracted attention and criticism [2]. Its simplicity and the lack of assumptions on the nature of the texts, however, make it an appealing technique for evaluating distance matrices. A (symmetric) distance matrix contains an entry for each pair of items, i.e., texts from different sources. The authors propose a distance $D_{AB}$, reported in the appendix, and show ...

∗ This is the final essay for the first year of the SAFI school, written for the course “Linguaggio ed Evoluzione” (Language and Evolution) taught by Profs. Cavalli-Sforza and Wang.
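To make the idea of a compression-based distance matrix concrete, here is a minimal sketch. It does not reproduce the distance $D_{AB}$ of [1], which is defined in the appendix; as a stand-in it uses the normalized compression distance (NCD), a different but related measure, with `zlib` as the compressor. The function names `compressed_size`, `ncd`, and `distance_matrix` are illustrative assumptions, not taken from the essay.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (compression level 9)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    Stand-in for D_AB: NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C(.) is the compressed size.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def distance_matrix(texts: list[bytes]) -> list[list[float]]:
    """Symmetric matrix with one entry per pair of texts and a zero diagonal."""
    n = len(texts)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = ncd(texts[i], texts[j])
            # Compute each pair once and mirror it, so the matrix is
            # symmetric by construction.
            matrix[i][j] = d
            matrix[j][i] = d
    return matrix
```

Calling `distance_matrix([text_a, text_b, text_c])` on byte-encoded texts returns a nested list that a tree-building routine could take as input. Computing each pair only once forces symmetry, although a real compressor may give slightly different sizes for `x + y` and `y + x`.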
Related resources
Language trees and zipping.
In this Letter we present a very general method for extracting information from a generic string of characters, e.g., a text, a DNA sequence, or a time series. Based on data-compression techniques, its key point is the computation of a suitable measure of the remoteness of two bodies of knowledge. We present the implementation of the method to linguistic motivated problems, featuring highly acc...
Investigation of the Allometric Models in Estimation of Poplar (Populus deltoides) Height
One of the most important issues in forest biometrics is the use of allometric functions to estimate the tree height by using diameter-height models. Measuring the total height of trees is usually a complex and time-consuming process. In allometric functions, the diameter is measured directly but the height of the tree is an estimate of an allometric model, which will be more accurate if the cr...
A Study on the Accuracy and Precision of Estimation of the Number, Basal Area and Standing Trees Volume per Hectare Using of some Sampling Methods in Forests of NavAsalem
The present study aimed to investigate the accuracy and precision of estimating the number, basal area and volume of standing trees by random and systematic random sampling in the forests of West Guilan. The cost or inventory time was determined using the criterion (E%)² × T. Inventory was carried out by complete sampling (census) in an area of 52 hectares. The study area (sect...
Comment on "Language Trees and Zipping" arXiv:cond-mat/0108530
Every encoding has a priori information if the encoding represents any semantic information of the universe or object. Encoding means mapping from the universe to the string or strings of digits. The semantic here is used in the model-theoretic sense or denotation of the object. If encoding or strings of symbols is the adequate and true mapping of model or object, and the mapping is recursive or compu...
Development of an allometric model to estimate above-ground biomass of forests using MLPNN algorithm, case study: Hyrcanian forests of Iran
This research develops an allometric model for estimation of biomass based on the height and DBH of trees in the Hyrcanian forests of Iran. An accurate allometric model reduces the uncertainty of allometric equation in biomass estimation using radar images. In this study, 317 trees were selected randomly from the 4 different dominant tree species for the development of an allometric model cover...